Des Moines
Blending 3D Geometry and Machine Learning for Multi-View Stereopsis
Vats, Vibhas, Reza, Md. Alimoor, Crandall, David, Jung, Soon-heung
Traditional multi-view stereo (MVS) methods primarily depend on photometric and geometric consistency constraints. In contrast, modern learning-based algorithms often rely on the plane sweep algorithm to infer 3D geometry and apply explicit geometric consistency (GC) checks only as a post-processing step, with no impact on the learning process itself. In this work, we introduce GC-MVSNet++, a novel approach that actively enforces geometric consistency of reference-view depth maps across multiple source views (multi-view) and at various scales (multi-scale) during the learning phase (see Fig. 1). This integrated GC check significantly accelerates learning by directly penalizing geometrically inconsistent pixels, effectively halving the number of training iterations compared to other MVS methods. Furthermore, we introduce a densely connected cost regularization network with two distinct block designs, simple and feature-dense, optimized to harness dense feature connections for enhanced regularization. Extensive experiments demonstrate that our approach achieves a new state of the art on the DTU and BlendedMVS datasets and secures second place on the Tanks and Temples benchmark. To our knowledge, GC-MVSNet++ is the first method to enforce multi-view, multi-scale supervised geometric consistency during learning. Our code is available.
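The classic per-pixel geometric consistency check that learning-based MVS pipelines typically apply only as post-processing works as a forward-backward reprojection. A reference pixel is lifted to 3D with its predicted depth and projected into a source view. The source depth at that location is then looked up, lifted back to 3D, and reprojected into the reference view. The pixel passes if the round trip lands close to its starting point in both image position and depth. The NumPy sketch below is minimal and rests on several assumptions: pinhole cameras sharing one intrinsic matrix `K`, 4×4 camera-to-world poses, nearest-neighbour depth lookup, and hypothetical thresholds and function names. It is not the GC-MVSNet++ training loss.

```python
import numpy as np

def backproject(u, v, depth, K, cam_to_world):
    """Lift pixel (u, v) at the given depth to a 3D world point."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    cam_pt = depth * ray
    return (cam_to_world @ np.append(cam_pt, 1.0))[:3]

def project(point, K, world_to_cam):
    """Project a 3D world point into a view; returns (u, v, depth)."""
    cam_pt = (world_to_cam @ np.append(point, 1.0))[:3]
    uvw = K @ cam_pt
    return uvw[0] / uvw[2], uvw[1] / uvw[2], cam_pt[2]

def is_geometrically_consistent(u, v, depth_ref, depth_src, K, ref_c2w, src_c2w,
                                pix_thresh=1.0, rel_depth_thresh=0.01):
    """Forward-backward reprojection check for reference pixel (u, v).

    Thresholds are illustrative assumptions, not values from the paper.
    """
    d_ref = depth_ref[v, u]
    # Reference pixel -> 3D point -> source image.
    X = backproject(u, v, d_ref, K, ref_c2w)
    us, vs, _ = project(X, K, np.linalg.inv(src_c2w))
    ui, vi = int(round(us)), int(round(vs))
    h, w = depth_src.shape
    if not (0 <= ui < w and 0 <= vi < h):
        return False  # projects outside the source view
    # Source depth (nearest-neighbour lookup) -> 3D point -> back to reference.
    Y = backproject(us, vs, depth_src[vi, ui], K, src_c2w)
    ur, vr, d_back = project(Y, K, np.linalg.inv(ref_c2w))
    pix_err = np.hypot(ur - u, vr - v)
    rel_err = abs(d_back - d_ref) / d_ref
    return bool(pix_err < pix_thresh and rel_err < rel_depth_thresh)
```

Running this check against several source views and counting how many agree gives a per-pixel consistency mask. The abstract describes GC-MVSNet++ using this kind of signal during training, at multiple scales, to penalize inconsistent pixels, but it does not give the exact formulation.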
Exploring psychophysiological methods for human-robot collaboration in construction
Wong, Saika, Chen, Zhentao, Pan, Mi, Skibniewski, Miroslaw J.
Human-robot collaboration (HRC) refers to scenarios in which humans and robots work collaboratively toward a common goal, sharing tasks and responsibilities in a way that capitalizes on the strengths of both parties [3]. As construction tasks become increasingly complex and time-sensitive, the integration of collaborative robots, or cobots, into the construction industry has emerged as a way to enhance efficiency while mitigating operational risks [86, 90]. However, real-world deployment of HRC in construction confronts multifaceted challenges, such as trust in robotic capabilities [21], frequent reconfigurations of working conditions [43], and communication in noisy and unstructured environments [24]. These challenges are exacerbated by the reliability and safety issues inherent in complicated and dynamic construction activities and environments (e.g., human dynamics, non-deterministic features, and the presence of various materials) [49, 50]. To address these limitations, the development of HRC is shifting from performance-oriented approaches to human-centric paradigms, emphasizing a comprehensive interpretation of collaborative behaviors between humans and their robot
Various psychophysiological methods have been employed to interpret psychological phenomena in the context of HRC by measuring workers' brain and physiological activity, such as electroencephalography (EEG) for brain activity [73], photoplethysmography (PPG) and electrocardiography (ECG) for cardiac activity [7], and electrodermal activity (EDA) for skin response [8]. Given the merits of these technologies, some initial efforts to apply psychophysiological methods to HRC in construction have been made. For instance, real-time feedback on individuals' physiological responses [21] and cognitive load [50] has been used to let cobots adjust their behavior (e.g., accelerate, stop, slow down) in response to workers' changing conditions. However, studies on wearable psychophysiological methods for the construction industry remain limited and at an early stage, and they primarily focus on interpreting a single dimension of worker status. While these methods hold promise for advancing human-centric robot collaboration in construction, their potential has not yet been fully explored, and current applications remain largely experimental.
Logic-RAG: Augmenting Large Multimodal Models with Visual-Spatial Knowledge for Road Scene Understanding
Kabir, Imran, Reza, Md Alimoor, Billah, Syed
Large multimodal models (LMMs) are increasingly integrated into autonomous driving systems for user interaction. However, their limitations in fine-grained spatial reasoning pose challenges for system interpretability and user trust. We introduce Logic-RAG, a novel Retrieval-Augmented Generation (RAG) framework that improves LMMs' spatial understanding in driving scenarios. Logic-RAG constructs a dynamic knowledge base (KB) about object-object relationships in first-order logic (FOL) using a perception module, a query-to-logic embedder, and a logical inference engine. We evaluated Logic-RAG on visual-spatial queries using both synthetic and real-world driving videos. When using popular LMMs (GPT-4V, Claude 3.5) as proxies for an autonomous driving system, these models achieved only 55% accuracy on synthetic driving scenes and under 75% on real-world driving scenes. Augmenting them with Logic-RAG increased their accuracies to over 80% and 90%, respectively. An ablation study showed that even without logical inference, the fact-based context constructed by Logic-RAG alone improved accuracy by 15%. Logic-RAG is extensible: it allows seamless replacement of individual components with improved versions and enables domain experts to compose new knowledge in both FOL and natural language. In sum, Logic-RAG addresses critical spatial reasoning deficiencies in LMMs for autonomous driving applications. Code and data are available at https://github.com/Imran2205/LogicRAG.
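The plain-Python sketch below is a toy version of a first-order-logic knowledge base about object-object relations, closed under a few inference rules. Everything in it is hypothetical: the predicates `LeftOf`/`RightOf`, the three rules, the object IDs, and the sentence rendering. Naive forward chaining over ground facts stands in for a full FOL reasoner. Logic-RAG's actual perception module, query-to-logic embedder, and inference engine are not reproduced.

```python
from itertools import product

# Hypothetical mapping from predicate names to natural-language phrases.
PHRASES = {"LeftOf": "to the left of", "RightOf": "to the right of"}

def forward_chain(facts):
    """Saturate (predicate, subject, object) facts under hypothetical spatial rules."""
    kb = set(facts)
    while True:
        derived = set()
        for pred, a, b in kb:
            if pred == "LeftOf":      # LeftOf(a, b) -> RightOf(b, a)
                derived.add(("RightOf", b, a))
            elif pred == "RightOf":   # RightOf(a, b) -> LeftOf(b, a)
                derived.add(("LeftOf", b, a))
        # Transitivity: LeftOf(a, b) and LeftOf(b, d) -> LeftOf(a, d)
        for (p1, a, b), (p2, c, d) in product(kb, repeat=2):
            if p1 == p2 == "LeftOf" and b == c and a != d:
                derived.add(("LeftOf", a, d))
        if derived <= kb:  # fixed point: nothing new can be derived
            return kb
        kb |= derived

def to_context(kb):
    """Render facts as sentences that could be prepended to an LMM prompt."""
    return "\n".join(f"{a} is {PHRASES[pred]} {b}." for pred, a, b in sorted(kb))

# Facts a perception module might emit for one frame (hypothetical IDs).
kb = forward_chain({("LeftOf", "car1", "bus1"), ("LeftOf", "bus1", "truck2")})
print(to_context(kb))
```

The rendered sentences play the role of the fact-based context that, according to the ablation above, already helps an LMM even without logical inference. Derived facts such as "car1 is to the left of truck2" are what the inference step adds.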
Data-driven Super-Resolution of Flood Inundation Maps using Synthetic Simulations
Aravamudan, Akshay, Rasheed, Zimeena, Zhang, Xi, Scarpignato, Kira E., Nikolopoulos, Efthymios I., Krajewski, Witold F., Anagnostopoulos, Georgios C.
The frequency of extreme flood events is increasing throughout the world. Daily, high-resolution (30 m) Flood Inundation Maps (FIMs) observed from space play a key role in informing mitigation and preparedness efforts against these extreme events. However, publicly available high-resolution FIMs, e.g., from Landsat, are only available at intervals on the order of two weeks, which limits effective monitoring of flood inundation dynamics. Conversely, global, low-resolution (~300 m) Water Fraction Maps (WFMs) are publicly available daily from NOAA VIIRS. Motivated by the recent successes of deep learning methods for single-image super-resolution, we explore the effectiveness and limitations of similar data-driven approaches for downscaling low-resolution WFMs to high-resolution FIMs. To overcome the scarcity of high-resolution FIMs, we train our models on high-quality synthetic data obtained through physics-based simulations. We evaluate our models on real-world data from flood events in the state of Iowa. The study indicates that data-driven approaches achieve higher reconstruction accuracy than non-data-driven alternatives and that synthetic data is a viable proxy for training. Additionally, we show that our trained models can achieve superior zero-shot performance when transferred to regions hydroclimatologically similar to the U.S. Midwest.
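Going from ~300 m WFM cells to 30 m FIM pixels is a 10× upscaling per axis, so each coarse cell covers 100 fine pixels. The NumPy sketch below shows a naive, non-learned baseline: it copies each water fraction to its 10 × 10 block and then thresholds it. The function name, the fixed scale factor, and the 0.5 threshold are illustrative assumptions. This is not necessarily the non-data-driven alternative evaluated in the study.

```python
import numpy as np

SCALE = 10  # assumed fixed factor: ~300 m WFM cell -> 30 m FIM pixels per axis

def naive_downscale(wfm, threshold=0.5, scale=SCALE):
    """Nearest-neighbour upsampling of a water-fraction map, then thresholding.

    wfm: 2-D array of water fractions in [0, 1] at coarse resolution.
    Returns a boolean high-resolution inundation mask.
    """
    fine = np.repeat(np.repeat(wfm, scale, axis=0), scale, axis=1)
    return fine >= threshold

wfm = np.array([[0.0, 0.6], [0.3, 1.0]])
mask = naive_downscale(wfm)  # shape (20, 20)
```

Every coarse cell becomes entirely wet or entirely dry, so sub-cell flood-edge detail is lost. Recovering that detail is what the data-driven models are meant to learn.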
ErgoChat: a Visual Query System for the Ergonomic Risk Assessment of Construction Workers
Fan, Chao, Mei, Qipei, Wang, Xiaonan, Li, Xinming
In the construction sector, workers often endure prolonged periods of high-intensity physical work and extended tool use, resulting in injuries and illnesses primarily linked to postural ergonomic risks, which have long been a predominant health concern. To mitigate these risks, researchers have applied various technological methods to identify the ergonomic risks that construction workers face. However, traditional ergonomic risk assessment (ERA) techniques do not offer interactive feedback. Rapidly developing vision-language models (VLMs), which can generate textual descriptions of, or answer questions about, ergonomic risks from image inputs, have not yet received widespread attention in this area. This research introduces an interactive visual query system tailored to assessing the postural ergonomic risks of construction workers. The system's capabilities include visual question answering (VQA), which responds to visual queries about workers' exposure to postural ergonomic risks, and image captioning (IC), which generates textual descriptions of these risks from images. Additionally, this study proposes a dataset designed for training and testing such methods. Systematic testing indicates that the VQA functionality achieves an accuracy of 96.5%. Moreover, evaluations using nine IC metrics and assessments by human experts indicate that the proposed approach outperforms a method with the same architecture trained solely on generic datasets. This study sets a new direction for future developments in interactive ERA using generative artificial intelligence (AI) technologies.
Keywords: Generative Artificial Intelligence; Vision-Language Model; Large Language Model; Ergonomic Risk Assessment; Construction Safety
1 Introduction
Prompt and effective identification and mitigation of workplace hazards are essential for maintaining safety, health, and productivity in the work environment.
In the construction industry, workers are often subject to conditions that require awkward body postures, repetitive motions, and intense physical effort, which can harm their health [1]. Such conditions in construction tasks often lead to work-related musculoskeletal disorders (WMSDs). Statistics from the United States Bureau of Labor Statistics show that the construction industry ranked fifth among all industries for injuries and illnesses caused by WMSDs. Moreover, in the same year, WMSDs represented 30% of all occupational injuries and illnesses [1]. According to the Association of Workers' Compensation Boards of Canada, the manufacturing and construction sectors reported the second- and third-highest rates of lost-time injury claims in 2021, representing 13.6% and 10.4% of claims, respectively [2]. The European Agency for Safety and Health at Work indicated that the construction and manufacturing sectors reported the highest sick leave rates due to WMSDs [3].
Autonomous Building Cyber-Physical Systems Using Decentralized Autonomous Organizations, Digital Twins, and Large Language Model
Ly, Reachsak, Shojaei, Alireza
Current autonomous building research focuses primarily on energy efficiency and automation. While traditional artificial intelligence has advanced autonomous building research, it often relies on predefined rules and struggles to adapt to complex, evolving building operations. Moreover, the centralized organizational structures of facilities management hinder transparency in decision-making, limiting true building autonomy. Research on decentralized governance and adaptive building infrastructure, which could overcome these challenges, remains relatively unexplored. This paper addresses these limitations by introducing a novel Decentralized Autonomous Building Cyber-Physical System framework that integrates Decentralized Autonomous Organizations, Large Language Models (LLMs), and digital twins to create smart, self-managed building infrastructure that is operationally and financially autonomous. This study develops a full-stack decentralized application to facilitate decentralized governance of building infrastructure. An LLM-based artificial intelligence assistant is developed to provide intuitive human-building interaction for blockchain and building operation management tasks and to enable autonomous building operation. Six real-world scenarios, including building revenue and expense management, AI-assisted facility control, and autonomous adjustment of building systems, were tested to evaluate the system's workability. Results indicate that the prototype successfully executes these operations, confirming the framework's suitability for developing building infrastructure with decentralized governance and autonomous operation.
Democratizing Signal Processing and Machine Learning: Math Learning Equity for Elementary and Middle School Students
Vaswani, Namrata, Selim, Mohamed Y., Gibert, Renee Serrell
Signal Processing (SP) and Machine Learning (ML) rely on strong math and coding knowledge, in particular linear algebra, probability, and complex numbers. A good grasp of these relies on the scalar algebra learned in middle school. The ability to understand and use scalar algebra well, in turn, relies on a solid foundation in basic arithmetic. Because of various systemic barriers, many students are unable to build a strong foundation in arithmetic in elementary school. As a result, they struggle with algebra and everything after it. Since math learning is cumulative, the gap between students without a strong early foundation and everyone else keeps widening over the school years and becomes difficult to close in college. In this article, we discuss how SP faculty and graduate students can play an important role in starting and participating in university-run (or other) out-of-school math support programs that supplement students' learning. Two example programs run by the authors (CyMath at ISU and Ab7G at Purdue) are briefly described. The second goal of this article is to draw on our perspective as SP and engineering educators who have seen the long-term impact of elementary school math teaching policies to offer some simple, almost zero-cost suggestions that elementary schools could adopt to improve math learning: (i) more math practice in school, (ii) small amounts of homework (individual work is critical in math), and (iii) parent awareness (of math resources and the need for an early math foundation, along with clear information about in-school tests and sharing of feedback from them). In summary, good early math support (in school and through out-of-school programs) can help make SP and ML more accessible.
X-ray Fluoroscopy Guided Localization and Steering of Medical Microrobots through Virtual Enhancement
Alabay, Husnu Halid, Le, Tuan-Anh, Ceylan, Hakan
In developing medical interventions using untethered milli- and microrobots, ensuring safety and effectiveness relies on robust methods for detection, real-time tracking, and precise localization within the body. However, the inherent opacity of the human body poses a significant obstacle, limiting robot detection primarily to specialized imaging systems such as X-ray fluoroscopy, which often lack crucial anatomical details. Consequently, the robot operator (human or machine) faces severe challenges in accurately determining the robot's location and steering its motion. This study explores the feasibility of circumventing this challenge by creating a simulation environment that contains a precise digital replica (virtual twin) of a model microrobot's operational workspace. Synchronizing the coordinate systems of the virtual and real worlds and continuously integrating microrobot position data from the image stream into the virtual twin allows the microrobot operator to control navigation in the virtual world. We validate this concept by demonstrating the tracking and steering of a mobile magnetic robot in confined phantoms with high-temporal-resolution visual feedback (< 100 ms, with an average of ~20 ms). Additionally, our object detection-based localization approach has the potential to reduce overall patient exposure to X-ray doses during continuous microrobot tracking without compromising tracking accuracy. Ultimately, we address a critical gap in developing image-guided remote interventions with untethered medical microrobots, particularly for near-future applications in animal models and human patients.
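One way to illustrate synchronizing the image and virtual-twin coordinate systems is a least-squares 2D affine fit from fiducial correspondences. Each detected robot position is then mapped into the virtual world. This minimal NumPy sketch rests on strong assumptions: a planar workspace, an affine model rather than a projective or fully calibrated fluoroscope model, and hypothetical function names and fiducials. It is not the registration method used in the study.

```python
import numpy as np

def fit_affine(img_pts, world_pts):
    """Least-squares 2D affine map from image pixels to virtual-twin coordinates.

    img_pts, world_pts: (N, 2) corresponding fiducial positions, N >= 3.
    Returns a (3, 2) matrix A such that [u, v, 1] @ A = [x, y].
    """
    img_pts = np.asarray(img_pts, dtype=float)
    design = np.hstack([img_pts, np.ones((img_pts.shape[0], 1))])
    A, *_ = np.linalg.lstsq(design, np.asarray(world_pts, dtype=float), rcond=None)
    return A

def to_virtual(A, uv):
    """Map a detected robot position (u, v) in pixels into the virtual twin."""
    u, v = uv
    return np.array([u, v, 1.0]) @ A

# Hypothetical fiducials: 0.1 mm per pixel with an offset of (5, -2) mm.
A = fit_affine([(0, 0), (100, 0), (0, 100), (100, 100)],
               [(5, -2), (15, -2), (5, 8), (15, 8)])
robot_xy = to_virtual(A, (50, 20))
```

With more than three fiducials, the least-squares fit averages out small detection errors. A homography would be the natural next step if perspective effects matter.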
Scaling the Vocabulary of Non-autoregressive Models for Efficient Generative Retrieval
Valluri, Ravisri, Mohankumar, Akash Kumar, Dave, Kushal, Singh, Amit, Jiao, Jian, Varma, Manik, Sinha, Gaurav
Generative Retrieval introduces a new approach to Information Retrieval by reframing it as a constrained generation task, leveraging recent advancements in Autoregressive (AR) language models. However, AR-based Generative Retrieval methods suffer from high inference latency and cost compared to traditional dense retrieval techniques, limiting their practical applicability. This paper investigates fully Non-autoregressive (NAR) language models as a more efficient alternative for generative retrieval. While standard NAR models alleviate latency and cost concerns, they exhibit a significant drop in retrieval performance (compared to AR models) due to their inability to capture dependencies between target tokens. To address this, we question the conventional choice of limiting the target token space to solely words or sub-words. We propose PIXAR, a novel approach that expands the target vocabulary of NAR models to include multi-word entities and common phrases (up to 5 million tokens), thereby reducing token dependencies. PIXAR employs inference optimization strategies to maintain low inference latency despite the significantly larger vocabulary. Our results demonstrate that PIXAR achieves a relative improvement of 31.0% in MRR@10 on MS MARCO and 23.2% in Hits@5 on Natural Questions compared to standard NAR models with similar latency and cost.
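For reference, the two reported retrieval metrics can be computed as shown below. These are the standard definitions, written in plain Python with hypothetical function names and document IDs. The official MS MARCO and Natural Questions evaluation scripts may differ in details such as tie handling or answer matching.

```python
def mrr_at_k(ranked_lists, relevant_sets, k=10):
    """Mean reciprocal rank of the first relevant item within the top k (0 if none)."""
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc in enumerate(ranked[:k], start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

def hits_at_k(ranked_lists, relevant_sets, k=5):
    """Fraction of queries with at least one relevant item in the top k."""
    hits = sum(
        any(doc in relevant for doc in ranked[:k])
        for ranked, relevant in zip(ranked_lists, relevant_sets)
    )
    return hits / len(ranked_lists)

# Two toy queries: the first finds its relevant doc at rank 2, the second misses.
ranked = [["d3", "d1", "d2"], ["d5", "d6", "d7"]]
relevant = [{"d1"}, {"d9"}]
```

A "relative improvement" such as the reported 31.0% is then (new − baseline) / baseline for the metric value.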
Smart Textile-Driven Soft Spine Exosuit for Lifting Tasks in Industrial Applications
Zhu, Kefan, Sharma, Bibhu, Phan, Phuoc Thien, Davies, James, Thai, Mai Thanh, Hoang, Trung Thien, Nguyen, Chi Cong, Ji, Adrienne, Nicotra, Emanuele, Lovell, Nigel H., Do, Thanh Nho
Work-related musculoskeletal disorders (WMSDs) are often caused by repetitive lifting, making them a significant concern in occupational health. Although wearable assistive devices have become the norm for mitigating the risk of back pain, most spinal assist devices still have a partially rigid structure that compromises user comfort and flexibility. This paper addresses this issue by presenting a smart textile-actuated spine assistance robotic exosuit (SARE), which conforms seamlessly to the back without impeding the user's movement and is extremely lightweight. The SARE can assist the human erector spinae in completing any action with virtually infinite degrees of freedom. To detect strain on the spine and control the smart textile automatically, the device uses a soft knitted sensor with fluid pressure as its sensing element. The device was validated experimentally with human subjects, reducing peak electromyography (EMG) signals of the lumbar erector spinae by around 32 percent under loaded and around 22 percent under unloaded conditions. Moreover, integrated EMG decreased by around 24.2 percent under the loaded condition and around 23.6 percent under the unloaded condition. In summary, this artificial-muscle wearable device represents an anatomical solution for reducing the risk of muscle strain, the metabolic energy cost, and the back pain associated with repetitive lifting tasks.
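The two reported outcome measures, peak EMG and integrated EMG (iEMG), and their percentage reductions can be sketched as follows. The sketch assumes an EMG trace that has already been band-pass filtered (and, in practice, normalized) and is sampled at `fs` Hz. It also assumes full-wave rectification and trapezoidal integration. The function names and toy signals are hypothetical, and the abstract does not describe the study's exact processing pipeline.

```python
import numpy as np

def peak_emg(signal):
    """Peak amplitude of the full-wave-rectified EMG."""
    return float(np.max(np.abs(signal)))

def integrated_emg(signal, fs):
    """iEMG: trapezoidal integral of the rectified signal over time (fs in Hz)."""
    rect = np.abs(np.asarray(signal, dtype=float))
    dt = 1.0 / fs
    return float(dt * np.sum(rect[:-1] + rect[1:]) / 2.0)

def percent_reduction(without_device, with_device):
    """Reduction relative to the unassisted value, in percent."""
    return 100.0 * (without_device - with_device) / without_device

# Toy traces (hypothetical values, fs = 1 Hz for easy arithmetic).
without = np.array([0.0, -2.0, 4.0, -2.0, 0.0])
with_dev = np.array([0.0, -1.0, 3.0, -1.0, 0.0])
peak_drop = percent_reduction(peak_emg(without), peak_emg(with_dev))
iemg_drop = percent_reduction(integrated_emg(without, 1.0), integrated_emg(with_dev, 1.0))
```

Peak EMG captures the worst-case muscle demand, while iEMG reflects cumulative activity over the lift. This is why the two reductions can differ, as they do in the reported results.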